Importance Sampling Estimates for Policies with Memory

Author

  • Christian R. Shelton
Abstract

Importance sampling has recently become a popular method for computing off-policy Monte Carlo estimates of returns. It is known that importance sampling ratios can be computed for POMDPs when both the sampled and target policies are reactive (memoryless). We extend that result to show how the ratios can also be computed efficiently for policies with memory state (finite state controllers), without resorting to the standard trick of pretending the memory is part of the environment. This allows for very data-efficient algorithms. We demonstrate the results on simulated problems.
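The abstract's starting point, off-policy Monte Carlo estimation with per-trajectory importance sampling ratios for reactive (memoryless) policies, can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the policy representation (a function mapping an observation to a list of action probabilities) and all names here are assumptions for the sketch.

```python
def importance_sampled_return(trajectories, pi_target, pi_behavior, gamma=1.0):
    """Off-policy Monte Carlo estimate of the target policy's expected return.

    trajectories: list of trajectories, each a list of (obs, action, reward)
    pi_target, pi_behavior: callables mapping an observation to a list of
        action probabilities (reactive/memoryless policies)
    gamma: discount factor
    """
    estimates = []
    for traj in trajectories:
        ratio = 1.0   # product of per-step likelihood ratios
        ret = 0.0     # discounted return of this trajectory
        for t, (obs, a, r) in enumerate(traj):
            # Reweight by how much more (or less) likely the target policy
            # was to take the sampled action than the behavior policy.
            ratio *= pi_target(obs)[a] / pi_behavior(obs)[a]
            ret += (gamma ** t) * r
        estimates.append(ratio * ret)
    return sum(estimates) / len(estimates)
```

When the two policies coincide, every ratio is 1 and this reduces to ordinary on-policy Monte Carlo averaging. The paper's contribution is computing the analogous ratios when the policies carry internal memory state, which this memoryless sketch does not cover.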


Similar resources

Approximating Bayes Estimates by Means of the Tierney-Kadane, Importance Sampling and Metropolis-Hastings within Gibbs Methods in the Poisson-Exponential Distribution: A Comparative Study

Here, we work on the problem of point estimation of the parameters of the Poisson-exponential distribution through the Bayesian and maximum likelihood methods based on complete samples. The point Bayes estimates under the symmetric squared error loss (SEL) function are approximated using three methods, namely the Tierney-Kadane approximation method, the importance sampling method and the Metrop...


Multi-step Off-policy Learning Without Importance Sampling Ratios

To estimate the value functions of policies from exploratory data, most model-free off-policy algorithms rely on importance sampling, where the use of importance sampling ratios often leads to estimates with severe variance. It is thus desirable to learn off-policy without using the ratios. However, such an algorithm does not exist for multi-step learning with function approximation. In this pap...


Policy Improvement for POMDPs Using Normalized Importance Sampling

We present a new method for estimating the expected return of a POMDP from experience. The estimator does not assume any knowledge of the POMDP, can estimate the returns for finite state controllers, allows experience to be gathered from arbitrary sequences of policies, and estimates the return for any new policy. We motivate the estimator from function-approximation and importance sa...


On the use of likelihood ratio as indicator of the accuracy of importance sampling estimates

This paper presents some observations made from experimenting with the use of importance sampling on large and small systems. The key point is to develop heuristics that enable the use of importance sampling even when the biasing strategy cannot be proven to be optimal or to produce estimates with bounded relative error. The main observation is that the likelihood ratio and its relative error see...


Local Adaptive Importance Sampling for Multivariate Densities with Strong Nonlinear Relationships

We consider adaptive importance sampling techniques which use kernel density estimates at each iteration as importance sampling functions. These can provide more nearly constant importance weights and more precise estimates of quantities of interest than the SIR algorithm when the initial importance sampling function is diffuse relative to the target. We propose a new method which adapts to the ...



Publication year: 2001